This repository was archived by the owner on Sep 17, 2024. It is now read-only.

chore: add APM integration to CI#1148

Merged
mdelapenya merged 22 commits into elastic:master from mdelapenya:apm-server-tests
Jul 28, 2021

@mdelapenya mdelapenya commented May 6, 2021

What does this PR do?

It adds a parallel execution on CI that runs the tests for the APM integration. In addition, because this PR had fallen behind the codebase, it also includes the following tasks:

  • update the test scenarios to the latest versions of the steps
  • remove the fleet-server bootstrapping from the stand-alone scenarios, as it's already provided by the runtime dependencies (starting the Fleet test suite implies starting the stack plus fleet-server).
  • decouple the operations on Fleet integrations into a separate feature file. This lets us tell errors in the integrations' CRUD operations apart from other errors. To do that, we move the operations to the new feature file and remove them from the original scenarios, which become simpler, each testing one specific behaviour.
  • minor refactors: removing leftovers and renaming scenarios for consistency
  • properly enroll the stand-alone agent into Fleet (cc/ @EricDavisX @adam-stokes @michalpristas): we discovered that the stand-alone agent never enrolled in Fleet, and that the 'the stand-alone agent is listed in Fleet as "online"' step was reading the online status of the wrong agent (the first agent in the list, which was the already bootstrapped fleet-server agent). We now get the hostname of the current agent and check that it appears in the agents list response; if it does, we verify that its status is online. Auto-enrollment was implemented in ce45452, which declares and populates the env vars the agent needs to enroll in Fleet through the already bootstrapped fleet-server.

Why is it important?

We were not running those tests on CI.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have run the Unit tests for the CLI, and they are passing locally
  • I have run the End-2-End tests for the suite I'm working on, and they are passing locally
  • I have noticed new Go dependencies (run make notice in the proper directory)

Author's Checklist

  • @jalvz, we are running the tests in parallel on CI, can you confirm that this is what we want for those tests?
  • @elastic/apm-server as discussed offline, we'd need your help in fixing these tests.

How to test this PR locally

TAGS="apm_server && install" TIMEOUT_FACTOR=5 LOG_LEVEL=TRACE DEVELOPER_MODE=true ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test

Related issues

Backports

Backport to 7.x, and check if 7.14.x and 7.13.x are required

@mdelapenya mdelapenya self-assigned this May 6, 2021
@mdelapenya mdelapenya requested review from a team and jalvz May 6, 2021 09:48
@mdelapenya mdelapenya marked this pull request as ready for review May 6, 2021 09:48
elasticmachine commented May 6, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-07-28T14:53:56.404+0000

  • Duration: 37 min 43 sec

  • Commit: 82393c3

Test stats 🧪

Test Results
Failed 0
Passed 240
Skipped 0
Total 240

💚 Flaky test report

Tests succeeded.

@mdelapenya
Contributor Author

Ok the APM tests are failing on CI. @jalvz could you take a look before merging this PR?

Contributor

@jalvz jalvz left a comment

oh i didn't know new tests did not run on CI... do we need also the cloud tag?

@mdelapenya
Contributor Author

> oh i didn't know new tests did not run on CI... do we need also the cloud tag?

No, because the @apm_server tag defines the scope for the entire feature file. We usually use that feature-level tag to split scenarios on CI, through the descriptor modified in this PR.
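To illustrate why a feature-level tag is enough, here is a minimal Go sketch of Gherkin tag inheritance: every scenario carries the tags of its feature in addition to its own. The helper names are hypothetical, and the matcher only handles a plain conjunction such as "apm_server && install"; it is not the tag-expression parser the test runner actually uses.

```go
package main

import "fmt"

// effectiveTags merges feature-level tags with scenario-level tags,
// mirroring how a scenario inherits the tags declared on its feature.
func effectiveTags(featureTags, scenarioTags []string) map[string]bool {
	tags := make(map[string]bool, len(featureTags)+len(scenarioTags))
	for _, t := range featureTags {
		tags[t] = true
	}
	for _, t := range scenarioTags {
		tags[t] = true
	}
	return tags
}

// matchesAll reports whether every required tag is present (logical AND).
func matchesAll(tags map[string]bool, required ...string) bool {
	for _, r := range required {
		if !tags[r] {
			return false
		}
	}
	return true
}

func main() {
	// The feature file is tagged once; the scenario only adds its own tag.
	featureTags := []string{"apm_server"}
	scenario := effectiveTags(featureTags, []string{"install"})

	fmt.Println(matchesAll(scenario, "apm_server", "install"))
	fmt.Println(matchesAll(scenario, "apm_server", "cloud"))
}
```

Under this model, selecting on the feature tag alone picks up every scenario in the file, so scenarios do not need an extra tag just to be scheduled on CI.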

@adam-stokes
Contributor

/test

@mdelapenya
Contributor Author

@adam-stokes I'm currently working in this branch, as there are some outdated scenarios that need rework

@mdelapenya
Contributor Author

@jalvz sorry for taking so long to push this forward. I've just returned from vacation and would like to merge this ASAP.

I've found one issue with the current implementation of the aStandaloneAgentIsDeployed test method: it never gets the apm-server process to spin up in the agent. In contrast, when the config files for "apm-legacy" are mounted as a volume (exactly as in the cloud scenario), the process does appear in the agent container. That would leave us with two identical scenarios.

Do you know why?

@mdelapenya
Contributor Author

Updated the description and pinged the @elastic/apm-server team so that they know how to reproduce this failure for the APM integration. As mentioned on Slack, they test the Fleet integration on every APM Server PR, so they pointed to a possible problem in the tests' setup.

@mdelapenya mdelapenya added the following labels on Jul 27, 2021: area:test (Anything related to the Test automation), priority:medium (Important work, but not urgent or blocking.), requested-by:APM (Requested by the APM team), size:M (1-5 days), Team:Automation (Label for the Observability productivity team), triaged (Triaged issues will end up in Backlog column in Robots GH Project)
@mdelapenya
Contributor Author

A thousand thanks to @michalpristas and @axw for the quick chats we had to unblock this.

* master:
  fix(ci): wait for downstream jobs on scheduled jobs (elastic#1393)
  Fix for the OS field and scenarios (elastic#1349)
  Add test case for add_kubernetes_metadata with autodiscover (elastic#1385)
  chore(ci): propagate downstream build result to upstream (elastic#1377)
  Scenario Outline: Starting the <image> agent starts backend processes
    When a "<image>" stand-alone agent is deployed
-   Then there are "1" instances of the "filebeat" process in the "started" state
+   Then there are "2" instances of the "filebeat" process in the "started" state
Contributor Author

This was an error hidden by the non-autoenrolling stand-alone agent.
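The step changed in this review asserts an exact number of process instances in a given state. A minimal Go sketch of that kind of check is shown below; the Process type and both function names are hypothetical and are not the suite's real implementation.

```go
package main

import "fmt"

// Process is a hypothetical record of one process seen in the agent container.
type Process struct {
	Name  string
	State string
}

// countProcesses returns how many processes have the given name and state.
func countProcesses(procs []Process, name, state string) int {
	n := 0
	for _, p := range procs {
		if p.Name == name && p.State == state {
			n++
		}
	}
	return n
}

// checkInstances fails unless exactly `expected` matching processes exist,
// so both a missing and an unexpected extra instance are reported.
func checkInstances(procs []Process, name, state string, expected int) error {
	if got := countProcesses(procs, name, state); got != expected {
		return fmt.Errorf("expected %d %q processes in state %q, got %d",
			expected, name, state, got)
	}
	return nil
}

func main() {
	procs := []Process{
		{Name: "filebeat", State: "started"},
		{Name: "filebeat", State: "started"},
		{Name: "metricbeat", State: "started"},
	}
	fmt.Println(checkInstances(procs, "filebeat", "started", 2))
	fmt.Println(checkInstances(procs, "filebeat", "started", 1))
}
```

Because the comparison is strict equality rather than "at least one", a wrong expected count surfaces as soon as the agent's real behaviour is exercised.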

@mdelapenya mdelapenya merged commit bc65335 into elastic:master Jul 28, 2021
mergify Bot pushed a commit that referenced this pull request Jul 28, 2021
* chore: add APM integration to CI

* fix: update scenario step to latest version

* chore: remove blank lines

* fix: remove fleet server from the stand-alone agent

* feat: add an scenario for adding integrations

* chore: simplify scenarios avoiding testing twice

The installation of the integration to the policy is already tested in another scenario

* chore: rename scenario

* chore: add integrations feature file to the CI

* chore: remove references to FleetServerPolicy, as it's not used anymore

* fix: bring fleet-server boostrap test back

* fix: expose cloud agent in a not used port

* chore: extract a method to get Fleet Server URL

* fix: get stand-alone agent by hostname from agents list

We were getting the 1st agent, and because we have an agent bootstrapped
as fleet-server, it was retrieved as the first one, causing that the
"is agent 'online'" step always returned true, instead of returning the
status of the newly deployed agent.

* fix: automatically enroll the stand-alone agent in Fleet

* fix: reduce the number of occurrences

* fix: there are 2 filebeat instances

* chore: move cloud configs to a better place

* chore: run APM tests with ubi8 base image

* fix: keep original structure

* fix: right volume path

* chore: remove cloud scenario for APM integration

(cherry picked from commit bc65335)
mergify Bot pushed a commit that referenced this pull request Jul 28, 2021
mergify Bot pushed a commit that referenced this pull request Jul 28, 2021
mdelapenya added a commit that referenced this pull request Jul 28, 2021
* chore: add APM integration to CI

* fix: update scenario step to latest version

* chore: remove blank lines

* fix: remove fleet server from the stand-alone agent

* feat: add an scenario for adding integrations

* chore: simplify scenarios avoiding testing twice

The installation of the integration to the policy is already tested in another scenario

* chore: rename scenario

* chore: add integrations feature file to the CI

* chore: remove references to FleetServerPolicy, as it's not used anymore

* fix: bring fleet-server boostrap test back

* fix: expose cloud agent in a not used port

* chore: extract a method to get Fleet Server URL

* fix: get stand-alone agent by hostname from agents list

We were getting the 1st agent, and because we have an agent bootstrapped
as fleet-server, it was retrieved as the first one, causing that the
"is agent 'online'" step always returned true, instead of returning the
status of the newly deployed agent.

* fix: automatically enroll the stand-alone agent in Fleet

* fix: reduce the number of occurrences

* fix: there are 2 filebeat instances

* chore: move cloud configs to a better place

* chore: run APM tests with ubi8 base image

* fix: keep original structure

* fix: right volume path

* chore: remove cloud scenario for APM integration

(cherry picked from commit bc65335)

Co-authored-by: Manuel de la Peña <mdelapenya@gmail.com>
mdelapenya added a commit that referenced this pull request Jul 28, 2021
mdelapenya added a commit that referenced this pull request Jul 28, 2021
@mdelapenya mdelapenya deleted the apm-server-tests branch July 30, 2021 12:07

Labels

  • area:test (Anything related to the Test automation)
  • priority:medium (Important work, but not urgent or blocking.)
  • requested-by:APM (Requested by the APM team)
  • size:M (1-5 days)
  • Team:Automation (Label for the Observability productivity team)
  • triaged (Triaged issues will end up in Backlog column in Robots GH Project)
  • v7.13.0
  • v7.14.0
  • v7.15.0

6 participants